4.2.2 Anchors in the Region Proposal Network (RPN)
The original paper suggests anchor scales of 32, 64, 128, 256, and 512 pixels; these are the side lengths of the
square anchors. However, cell nuclei are much smaller than the objects that Mask R-CNN was designed to detect
in the original paper, so in this project I use anchor scales of 4, 8, 16, 128, and 256 instead. In a preliminary
study I found that most nuclei fall in the 8 × 8 pixel range, with some as large as roughly 100 × 100 pixels.
This is why I keep the three smallest scales (4, 8, and 16) but then jump to 128 × 128 pixels: very few nuclei
in our dataset are larger than 16 × 16 and smaller than 128 × 128 pixels.
I use the same anchor aspect ratios as suggested in the paper (0.5, 1, and 2). For the anchor stride, I use a
value of one, which places anchors at every position (i.e., every cell) of the backbone feature map. For the
strides of the FPN pyramid, I use the values suggested in the paper (4, 8, 16, 32, 64), since these were tuned
for the ResNet-101 backbone architecture that I am using. With these values, the lowest level of the pyramid has
a stride of 4 px relative to the image, so its anchors are created at 4-pixel intervals.
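To make the scale and ratio settings concrete, the sketch below computes each anchor's width and height under the common convention that an anchor of scale s and ratio r (width/height) keeps an area of s², i.e. width = s·√r and height = s/√r. This convention is an assumption; implementations differ in how the ratio is defined.

```python
import numpy as np

# Anchor scales used in this project and aspect ratios from the original paper.
scales = [4, 8, 16, 128, 256]
ratios = [0.5, 1, 2]  # width / height

for s in scales:
    for r in ratios:
        # Preserve the anchor area (s * s) while varying the shape.
        w = s * np.sqrt(r)
        h = s / np.sqrt(r)
        print(f"scale={s:3d} ratio={r:3}: {w:6.1f} x {h:6.1f} px")
```

At scale 8, for example, the three ratios give anchors of roughly 5.7 × 11.3, 8 × 8, and 11.3 × 5.7 pixels, all with the same area.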
For the threshold of the non-max suppression stage, the original paper suggests 0.3. However, I found that a
higher value of 0.7 or 0.8 improves performance on the test set: a higher threshold suppresses only
near-duplicate proposals, leaving more candidates for the later stages. For training, I sample 256 anchors per image.
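The suppression step can be sketched as a minimal greedy non-max suppression; the function name and the `[y1, x1, y2, x2]` box layout here are illustrative, not taken from the project code.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.7):
    """Greedy NMS. boxes: (N, 4) array of [y1, x1, y2, x2]; returns kept indices."""
    order = np.argsort(scores)[::-1]  # highest-scoring proposal first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with each remaining box.
        y1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        x1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        y2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        x2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, y2 - y1) * np.maximum(0, x2 - x1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Discard boxes overlapping the kept box by more than the threshold.
        order = order[1:][iou <= iou_threshold]
    return keep
```

With a threshold of 0.7, only proposals that overlap a higher-scoring proposal by more than 0.7 IoU are discarded, which matters for densely packed nuclei.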
4.2.3 Training and Testing Regions of Interest (ROIs)
After non-max suppression, I keep 2000 ROIs during training and 1000 ROIs at test time. The original paper
initially generates 1000 ROIs per image, which I have reduced to 600 here. The paper notes that, for best
results, the sampling stage should pick approximately 33% positive ROIs. Since my images are smaller and contain
fewer objects than those in the task for which Mask R-CNN was originally used, I generate fewer ROIs per image
initially, so that after subsampling I am left with roughly 33% positive ROIs.
For the number of ROIs per image fed to the classifier and mask heads, the original paper suggests 512; I use
200. Again, this is because with 512 the RPN generates too many positive proposals, so I generate fewer
proposals to begin with in order to keep the fraction of positive ROIs at roughly 1/3, as the paper suggests.
The criteria for labeling ROIs as positive or negative are the same as those described in the Methodology
section (and in the original paper). Also, for detection, I set the minimum confidence required to accept a
detected instance to 0.7; ROIs with confidence below 0.7 are discarded at test time.
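The sampling step described above can be sketched as follows. The helper name and the use of precomputed positive/negative index lists are my own simplification, not the project's actual code.

```python
import numpy as np

def sample_rois(positive_ids, negative_ids, rois_per_image=200,
                positive_fraction=1 / 3, rng=None):
    """Draw up to `rois_per_image` ROIs, targeting ~1/3 positives."""
    rng = rng or np.random.default_rng(0)
    # Cap the positive count at the target fraction, then fill with negatives.
    n_pos = min(int(rois_per_image * positive_fraction), len(positive_ids))
    n_neg = min(rois_per_image - n_pos, len(negative_ids))
    pos = rng.choice(positive_ids, n_pos, replace=False)
    neg = rng.choice(negative_ids, n_neg, replace=False)
    return np.concatenate([pos, neg])
```

With 200 ROIs per image, this yields at most 66 positives per image, which is why generating fewer initial proposals keeps the positive fraction near the target.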
In the original Mask R-CNN paper, the maximum number of instances to use in training and to return at test time
must be specified. The original paper uses 100 instances per image for both, meaning that regardless of how
many instances an image contains, only 100 objects are returned. In this project, however, some images contain
up to 500 nuclei. Unfortunately, I cannot set the maximum number of ground-truth instances to 512 during
training due to memory limits, so I use a maximum of 256 ground-truth instances for training (i.e., the
allowable number of objects per image during training). At test time, however, I set this number to 512 so that
images containing a large number of cell nuclei are detected correctly.
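The hyperparameter choices described in this section can be collected in one place. The attribute names below are modeled on the widely used Matterport Mask R-CNN `Config` class; this is an assumption about the implementation, and the exact names may differ in the code actually used.

```python
# Sketch of this project's hyperparameters, with attribute names modeled on
# the Matterport Mask R-CNN Config class (an assumption, not confirmed here).
class NucleusConfig:
    RPN_ANCHOR_SCALES = (4, 8, 16, 128, 256)  # anchor side lengths (px)
    RPN_ANCHOR_RATIOS = [0.5, 1, 2]           # anchor aspect ratios
    RPN_ANCHOR_STRIDE = 1                     # anchors at every feature-map cell
    BACKBONE_STRIDES = [4, 8, 16, 32, 64]     # FPN pyramid strides
    RPN_NMS_THRESHOLD = 0.7                   # NMS threshold for RPN proposals
    RPN_TRAIN_ANCHORS_PER_IMAGE = 256         # anchors sampled per image
    POST_NMS_ROIS_TRAINING = 2000             # ROIs kept after NMS (training)
    POST_NMS_ROIS_INFERENCE = 1000            # ROIs kept after NMS (test)
    TRAIN_ROIS_PER_IMAGE = 200                # ROIs fed to classifier/mask heads
    ROI_POSITIVE_RATIO = 0.33                 # target fraction of positive ROIs
    DETECTION_MIN_CONFIDENCE = 0.7            # discard detections below this
    MAX_GT_INSTANCES = 256                    # ground-truth instances (training)
    DETECTION_MAX_INSTANCES = 512             # max instances returned (test)
    MEAN_PIXEL = [123.7, 116.8, 103.9]        # mean RGB subtracted per image
```

In that codebase, the usual pattern is to subclass the library's `Config` and override only these attributes.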
Finally, for the mean RGB pixel values subtracted from each image for normalization, I use the same values as
suggested in the paper (123.7, 116.8, 103.9). At a future time, I might change these values to better reflect the colors present